A Longitudinal Bilingual Frisian-Dutch Radio Broadcast Database Designed for Code-Switching Research
Abstract
We present a new speech database containing 18.5 hours of annotated radio broadcasts in the Frisian language. Frisian is mostly spoken in the province of Fryslân and is the second official language of the Netherlands. The recordings are collected from the archives of Omrop Fryslân, the regional public broadcaster of the province. The database covers a time span of almost 50 years. Native speakers of Frisian are mostly bilingual and often code-switch in daily conversations due to the extensive influence of the Dutch language. Given the longitudinal and code-switching nature of the data, an appropriate annotation protocol has been designed, and the data has been manually annotated with orthographic transcriptions, speaker identities, dialect information, code-switching details, and background noise/music information.
Similar resources
Longitudinal Speaker Clustering and Verification Corpus with Code-Switching Frisian-Dutch Speech
In this paper, we present a new longitudinal and bilingual broadcast database designed for speaker clustering and text-independent verification research. The broadcast data is extracted from the archives of Omrop Fryslân, which is the regional broadcaster in the province of Fryslân, located in the north of the Netherlands. Two speaker verification tasks are provided in a standard enrollment-test ...
Open Source Speech and Language Resources for Frisian
In this paper, we present several open source speech and language resources for the under-resourced Frisian language. Frisian is mostly spoken in the province of Fryslân which is located in the north of the Netherlands. The native speakers of Frisian are Frisian-Dutch bilingual and often code-switch in daily conversations. The resources presented in this paper include a code-switching speech da...
Age of acquisition and naming performance in Frisian-Dutch bilingual speakers with dementia
Age of acquisition (AoA) of words is a recognised variable affecting language processing in speakers with and without language disorders. For bi- and multilingual speakers, their languages can be differentially affected in neurological illness. Study of language loss in bilingual speakers with dementia has been relatively neglected. Objective: We investigated whether AoA of words was associated...
Investigating Bilingual Deep Neural Networks for Automatic Recognition of Code-switching Frisian Speech
In this paper, a code-switching automatic speech recognition (ASR) system built for the Frisian language is described. Frisian is mostly spoken in the province of Fryslân, which is located in the north of the Netherlands. The native speakers of Frisian are mostly bilingual and often code-switch in daily conversations due to the extensive influence of the Dutch language. In the scope of the FAME! Pr...
Exploiting Untranscribed Broadcast Data for Improved Code-Switching Detection
We have recently presented an automatic speech recognition (ASR) system operating on Frisian-Dutch code-switched speech. This type of speech requires careful handling of unexpected language switches that may occur in a single utterance. In this paper, we extend this work by using some raw broadcast data to improve multilingually trained deep neural networks (DNN) that have been trained on 11.5 ...